Abstract

The social behaviour of human agents in Opportunistic Networks has a strong impact on the
performance of communication protocols, since the dynamic topology of such networks is often
conditioned by the presence of Communities, i.e., sets of agents that tend to meet each other
much more often than the average pair of agents does.

Community Detection is thus a preliminary task that may prove crucial for designing and
analysing efficient communication protocols for opportunistic networks.

Our contribution is the first analytical study of this task in dynamic networks. We provide 
a framework that formalizes the distributed version of the Community-Detection Problem over 
a general model of dynamic networks. According to this framework, the problem turns out to 
be a node-coloring task of the dynamic graph.

Then, we present an efficient provably-good coloring protocol for two classes of dynamic 
random graphs that have been recently adopted as mathematical models of some opportunistic 
networks. 

Keywords. Distributed Computing, Dynamic Graphs, Social Opportunistic Networks. 

1 Introduction 



Recent studies in opportunistic networks focus on the impact of the agents' social behavior on some
basic communication tasks such as routing and broadcasting [U [13] [14]. Strong attention to this
issue has been given in an emerging class of opportunistic networks called Intermittently-Connected
Mobile Networks (ICMNs) [16]: such networks are characterized by wireless links, representing
opportunities for exchanging data, that appear sporadically among humans carrying mobile radio
devices.

So-called social-aware communication protocols rely on the reasonable intuition that, since mobile
devices are carried by people who tend to form communities, members (i.e. nodes) of the
same community tend to meet each other much more often than nodes from different communities.
Experiments on real data sets have widely shown that identifying communities can strongly help
improve the performance of the protocols [3, 13, 14]. It thus follows that community detection
in ICMNs is a crucial issue.

Many centralized community-detection methods have been proposed in the literature (for a
good survey see [7]) that may prove useful for offline data analysis of mobile traces. However, it
is a common belief that near-future technologies will yield a dramatic growth of self-organizing
ICMNs where the network protocols work without relying on any centralized server. In this new
communication paradigm, community detection must be performed in a fully distributed
way.

To the best of our knowledge, only experimental studies are available for this important task. In
[10], some greedy protocols are tested on specific sets of real mobility-trace data. By running such
protocols, every node constructs and updates its own community list according to the length and
the rate of the contacts observed so far by itself and by the nodes it meets. So, the protocol exploits
the intuitive fact that communities are formed by nodes that tend to meet often and for a long time.
This assumption is somehow equivalent to the popular concept of latent classes introduced by
Wasserman and Anderson and by Snijders and Nowicki in social-network theory (for a good survey
on this topic see [9]). However, in such heuristic solutions, nodes need to update and transmit
relatively large lists of node IDs during the whole process. In several real ICMNs, this overhead may
be too heavy and, moreover, it may not be necessary for improving the performance of the
protocols. Indeed, in several cases, it suffices to let each node detect the community of every
(dynamic) neighbor. As described in the next paragraph, this can be obtained by computing a
coloring of the nodes that induces the right partition of the node set.

Distributed Community Detection in Dynamic Graphs. We propose a simple framework 
for defining the distributed Community-Detection problem in general dynamic networks. 

A dynamic graph is a probabilistic process that describes a graph whose topology changes with
time, so it can be represented by a sequence $\mathcal{G} = \{G_t = ([n], E_t) : t \in \mathbb{N}\}$ of graphs with the same
set $V = [n]$ of nodes, where $G_t$ is the snapshot of the dynamic graph at time step $t$. Thus, the
presence/absence of every link changes at any time step according to a (probabilistic) process. Our
framework is inspired by the "statistical" approach in [9] based on latent classes. In particular,
we formalize the following two key ideas. (i) Every snapshot of a dynamic graph can be seen as a
single sample from a specific graph distribution yielded by the probabilistic process: so, any social
structure of a dynamic network cannot be defined as a property of the single sample; rather, it
must be defined over the graph probability distribution determined by the process. (ii) Extracting
information about the network structure is thus a statistical task which can be performed by
observing a sequence of samples from the graph probability distribution. However, unlike
typical approaches in statistical inference, this task must be done in a fully distributed way.

We can thus define the Community-Detection problem as a node-coloring problem. Let $\mathcal{G}$ be
a dynamic network whose node set $[n]$ is partitioned into $\ell$ unknown communities $V_1, \ldots, V_\ell$. We
say that a function $Z : V \to \{1, \ldots, \ell\}$ is a good coloring for $\mathcal{G}$ if $Z$ colors each community with
a different color, i.e.,

$$\forall i, k \in [\ell],\ \forall u \in V_i,\ \forall v \in V_k : \quad Z(u) = Z(v) \iff i = k$$
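As an illustration, the good-coloring condition can be checked mechanically once the (normally unknown) partition is given. The following minimal sketch assumes that communities are passed as a list of node sets and the coloring as a dictionary; the function name `is_good_coloring` and this data layout are our own illustrative choices, not part of the framework.

```python
def is_good_coloring(communities, coloring):
    """Check the good-coloring condition for a known partition.

    communities: list of non-empty sets of nodes (hypothetical input layout).
    coloring:    dict mapping every node to its color.
    """
    community_colors = []
    for community in communities:
        colors_used = {coloring[u] for u in community}
        # Every community must be monochromatic.
        if len(colors_used) != 1:
            return False
        community_colors.append(colors_used.pop())
    # Distinct communities must receive distinct colors.
    return len(set(community_colors)) == len(communities)
```

Note that such a check needs global knowledge of the partition, so it can only serve for validating a run offline; the nodes themselves never have access to it.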

Then, the goal is to derive efficient distributed protocols for constructing a good coloring of a
dynamic graph $\mathcal{G}$ structured as unknown communities. Nodes are entities that share a global clock
and know the number $n$ of nodes, but they are not required to have distinct IDs. We again emphasize
that, initially, each node does not know its own community and is not able to distinguish the
community of its neighbors. At every time step, every node can exchange information with all its
current neighbors.

By adopting the concept of latent classes [9, 10], we first consider a simple community-network
model where, at every time step, every link between two nodes of the same community is present
with probability $p$ while the link probability goes down to $q \ll p$ when the two nodes belong to
different communities. This model of dynamic random network is a variant of the dynamic Erdős-Rényi
graph model [H [3] with a non-homogeneous edge-probability function. It will be shortly
named $\mathcal{G}(n,p,q)$, where $n$ is the number of nodes while $p = p(n)$ and $q = q(n)$ are the
edge-probability functions. This model clearly assumes important simplifications that may impact
several properties of real opportunistic networks: for instance, we have assumed that contacts
between nodes follow Bernoulli Processes, so the distribution of time between two contacts of a
pair follows an exponential law. Previous experiments have shown that this assumption holds only
at the timescale of days and weeks [3, 11]. However, in [3], experimental validations have shown
that some real ICMNs (e.g. those studied in the Haggle Project [3] and in the MIT Reality Mining
Project [5]) exhibit some crucial connectivity properties (such as hop diameter) which are well
approximated by sparse dynamic Erdős-Rényi graphs. The model $\mathcal{G}(n,p,q)$ thus aims to remove
the full-homogeneity assumption of the dynamic Erdős-Rényi graph model by introducing the
presence of unknown communities.
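For concreteness, one snapshot of this community model can be sampled as sketched below. This is only an illustration under our own assumptions: the helper name `sample_snapshot`, the list `community_of` (mapping each node to a community label) and the representation of edges as pairs `(u, v)` with `u < v` are hypothetical choices.

```python
import random

def sample_snapshot(community_of, p, q, rng):
    """Sample one snapshot: intra-community pairs are linked with
    probability p, inter-community pairs with probability q."""
    n = len(community_of)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if community_of[u] == community_of[v] else q
            # rng.random() is uniform in [0, 1), so this holds with probability prob.
            if rng.random() < prob:
                edges.add((u, v))
    return edges
```

A dynamic graph in this model is then just a sequence of independent calls, e.g. `[sample_snapshot(community_of, p, q, rng) for _ in range(T)]` with `rng = random.Random(seed)`.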

Another simplifying assumption in the dynamic Erdős-Rényi graph model is time independence:
the graph topology at time t is fully independent from the previous topology. Edge Markovian 
Evolving Graphs (in short edge-MEG) were first introduced in ^ as a generalization of the dynamic 
Erdős-Rényi graph model that captures the strong dependence between the existence of an edge at
a given time step and its existence at the previous time step. An edge-MEG is a dynamic random
graph $\mathcal{G}(n, p_\uparrow, p_\downarrow, E_0) = \{G_t = ([n], E_t) : t \in \mathbb{N}\}$ defined as follows. Starting from an initial
random edge set $E_0$, at every time step, every edge changes its state (existing or not) according to
a two-state Markovian process with probabilities $p_\uparrow$ and $p_\downarrow$. If an edge exists at time $t$, then, at
time $t + 1$, it dies with probability $p_\downarrow$. If instead the edge does not exist at time $t$, then it will come
up at time $t + 1$ with probability $p_\uparrow$.

We observe that the setting $p_\downarrow = 1 - p_\uparrow$ yields a sequence of independent Erdős-Rényi random
graphs, i.e., dynamic Erdős-Rényi graphs with edge probability $p = p_\uparrow$. Edge-MEGs have
been adopted as concrete models for several real dynamic networks such as faulty networks [6],
peer-to-peer systems [15], mobile ad-hoc networks [E], and vehicular networks [12]. Furthermore,
Edge-MEGs have been considered by Whitbeck et al. [16] as a concrete model for analyzing the
performance of epidemic routing on sparse ICMNs, and the obtained theoretical results have
also been validated over real trace data such as the Rollernet traces [H]. In this paper, we consider
the Edge-MEG as a mathematical model for ICMNs. The concept of unknown communities in
edge-MEGs can be introduced similarly to the way adopted for the dynamic Erdős-Rényi graphs:
here, we have two edge-probability parameter pairs $(p_\uparrow, p_\downarrow)$ and $(q_\uparrow, q_\downarrow)$ for two nodes $u$ and
$v$, depending on whether they belong to the same community or not. So, if both $u$ and $v$
belong to the same community, then the edge $(u, v)$ is governed by the 2-state Markov chain with
parameters $(p_\uparrow, p_\downarrow)$; otherwise, the edge is governed by the 2-state Markov chain with parameters
$(q_\uparrow, q_\downarrow)$. We assume that $q_\uparrow \ll p_\uparrow$ and, according to the parameter tuning performed in [16],
it turns out that the best fit to real scenarios is achieved by setting $p_\downarrow$ (and $q_\downarrow$) as absolute
constants. This is mainly due to the fact that, once a connection comes up, its expected lifetime
does not depend on the size of the network [16].
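The edge dynamics just described can be sketched as a single transition function. This is a minimal illustration under our own assumptions: the name `edge_meg_step`, the pairs `intra = (p_up, p_down)` and `inter = (q_up, q_down)`, and the edge representation as pairs `(u, v)` with `u < v` are hypothetical.

```python
import random

def edge_meg_step(edges, community_of, intra, inter, rng):
    """Advance every potential edge by one step of its 2-state Markov chain.

    intra = (p_up, p_down) governs pairs in the same community,
    inter = (q_up, q_down) governs pairs in different communities.
    """
    n = len(community_of)
    next_edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            same = community_of[u] == community_of[v]
            up, down = intra if same else inter
            if (u, v) in edges:
                alive = rng.random() >= down  # survives with probability 1 - down
            else:
                alive = rng.random() < up     # comes up with probability up
            if alive:
                next_edges.add((u, v))
    return next_edges
```

Choosing `down = 1 - up` for both pairs makes the next edge set independent of the current one, which is the special case of independent snapshots mentioned above.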
Our Algorithmic Contribution. We provide an efficient distributed randomized protocol for the
coloring problem in the $\mathcal{G}(n,p,q)$ model with a constant number $\ell$ of Communities. According to
the latent-class concept [9, 10] and the analysis on the clustering degree of some real opportunistic
networks performed in [17], we here assume that the edge-probability functions $p$ and $q$ are such
that $q = O(p/n^b)$, where $b > 0$ is an arbitrarily small constant. Our protocol yields with high
probability (in short w.h.p.) a good coloring in $O\left(\max\left\{\log n, \frac{\log n}{pn}\right\}\right)$ time. The bound is tight for
any $p = O(1/n)$ while it is only a logarithmic factor larger than the optimum for the rest of the
parameter range (i.e. for denser topologies).

The local coloring rule adopted by the protocol is simple and requires no node IDs. Furthermore,
unlike the previous solutions [10], our protocol preserves the privacy of the nodes. Indeed,
a node does not need to store or exchange any "personal" information (IDs, contact frequencies,
etc.) about the nodes it has met so far: the only exchanged information is the assigned color.

Our protocol can be easily adapted in order to construct a good coloring for the Edge-MEG
model $\mathcal{G}(n, p_\uparrow, p_\downarrow, q_\uparrow, q_\downarrow, E_0)$ in the parameter range $q_\uparrow = O(p_\uparrow/n^b)$, where $b$ is any positive constant.
The completion time is w.h.p. bounded by

$$O\left(M \cdot \max\left\{\log n, \frac{\log n}{p_\uparrow n}\right\}\right)$$

where $M$ is a bound on the mixing time of the two 2-state Markov chains governing the edges of
the dynamic graph. It is known that (see for example 0)

$$M = O\left(\max\left\{\frac{1}{p_\uparrow + p_\downarrow}, \frac{1}{q_\uparrow + q_\downarrow}, \log n\right\}\right)$$

Observe that, when $p_\downarrow$ and $q_\downarrow$ are some arbitrary positive constants and $p_\uparrow = \Omega(1/n)$ (this case
includes the "realistic" range derived in [16]), then $M = O(\log n)$ and the bound on the completion
time becomes $O(\log^2 n)$. This bound is only a logarithmic factor larger than the optimal coloring
time in the case of sparse topologies, i.e., when $p_\uparrow = O(1/n)$.
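To see why the sum of the two transition probabilities controls how fast a single edge forgets its initial state, one can iterate the exact one-step update of the probability that the edge is present. The sketch below is an illustration with hypothetical function names; it relies only on the standard fact that, for a 2-state chain with birth probability `p_up` and death probability `p_down`, the distance from the stationary value `p_up / (p_up + p_down)` is multiplied by `1 - p_up - p_down` at every step.

```python
def stationary_edge_probability(p_up, p_down):
    """Stationary probability that the edge is present."""
    return p_up / (p_up + p_down)

def edge_presence_probability(p_up, p_down, initial, steps):
    """Probability that the edge is present after `steps` steps,
    starting from presence probability `initial`."""
    prob = initial
    for _ in range(steps):
        # Present edges survive with prob. 1 - p_down; absent ones come up with prob. p_up.
        prob = prob * (1 - p_down) + (1 - prob) * p_up
    return prob
```

For instance, with `p_up = 0.1` and `p_down = 0.4` the stationary value is 0.2 and the deviation halves at every step, so a constant `p_down` already gives convergence in a number of steps that does not depend on `n`.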

We ran our protocol over hundreds of random instances according to the $\mathcal{G}(n,p,q)$ model with
$n$ varying from 10^ to 10^. Besides a good validation of our asymptotic analysis, further positive
features of the protocol are shown. Our protocol is indeed tolerant to non-homogeneous
edge-probability functions. In particular, the protocol almost always returns a good coloring in
Bernoullian graphs where the edge probability is not uniform, i.e., for each pair $(u, v)$ of nodes in
the same community, the parameter $p_{u,v}$ is suitably chosen in order to yield irregular sparse graphs.
A detailed description of the experimental results can be found in the Appendix (Section A).



2 The protocol and its analysis 

In this section, we first consider the dynamic graph $\mathcal{G}(n,p,q)$ and, for the sake of clarity, we
assume the following restrictions hold: the parameter $p$ is known by every node; there are only 2
communities $V_1$ and $V_2$, each of size $n/2$ ($n$ is an even number); the coloring process starts with
(exactly) two source nodes, $s_1 \in V_1$ that is $z_1$-colored and $s_2 \in V_2$ that is $z_2$-colored, with $z_1 \neq z_2$.
The parameters $p$ and $q$ belong to the following ranges

— ^ P ^ — , for some constant d> and q = O (-^^ , for some constant 6 > (1) 

n n \n° J 

^As usual, we say an event holds with high probability if it holds with probability not smaller than 1 wuj- 
Such restrictions make the description much easier, thus allowing us to focus on the main ideas
of our protocol and of its analysis. Then, at the end of this section, we will show how to remove
the assumption on the presence of the two sources and, in Section 3, we will get the general result
stated in the introduction.

The protocol relies on the following simple and natural idea. Starting from two source nodes
(one in each community), each one having a different color, the protocol performs a color spreading
by adopting a simple coloring/broadcasting rule (for instance, every node gets the color it sees most
frequently among its neighbors). Since links between agents of the same community are much more
frequent than the others, we can argue that the good coloring will be faster than the bad coloring
(in each community, the good coloring is the one from the source of the community while the
bad coloring is the one coming from the other source). However, providing a rigorous analysis of
the above process requires coping with some non-trivial probabilistic issues that have not been
considered in the analysis of information spreading in dynamic graphs made in previous works
[2] O [6]. Let us consider any local coloring rule that (only) depends on the color configuration
of the (dynamic) neighborhood of the node. At a given time step, there is a subset $I_c \subseteq [n]$ of
colored nodes and we need to evaluate the probabilities $P_g$ ($P_b$) that a non-colored node gets a
good (bad) color in the next step. After an initial phase, there is a non-negligible probability that
some nodes will get the bad color. Then, such nodes will start a spreading of the bad coloring at
the same rate as the good one. It turns out that the probabilities $P_g$ and $P_b$ strongly depend on the
balance between the sizes of the subsets of well-colored nodes (and the badly-colored ones) reached
in the two communities, respectively. Keeping a tight balance between such values during the whole
process is the main technical goal of the protocol. In arbitrary color configurations over sparse
graph snapshots, getting "high-probability" bounds on the rate of new (well/badly) colored nodes
is a non-trivial issue. Moreover, it is not hard to show that, given any two nodes $v, w \in [n] \setminus I_c$, the
events "$v$ will be (well/badly-)colored" and "$w$ will be (well/badly-)colored" are not independent.
As we will see, such issues are already present in the "restricted" case considered in this section.
A first important step of our approach is to describe the combination of the coloring process
and the dynamic graph as a finite-state Markovian process. Then, we perform a step-by-step
analysis, focusing on the probability that the Markovian Process visits a sequence of states having
"good-balance" properties.

Our protocol applies local rules only depending on the current node's neighborhood and on the
current time step. The protocol execution over the dynamic graph can be represented by the
following Markovian Process: for any time step $t$, we denote as $(k_1^{(t)}, k_2^{(t)}, h_1^{(t)}, h_2^{(t)}; E_t)$ the state
reached by the Markovian Process, where $k_i^{(t)}$ denotes the number of nodes in the $i$-th community
colored by color $z_i$ at time step $t$ and $h_i^{(t)}$ denotes the number of nodes in the $i$-th community
colored by color $z_j$ at time step $t$, for $i,j = 1,2$ and $j \neq i$. In particular, the Markovian Process
works as follows



[Figure: protocol]



The major benefit of this description is that, if we observe the process in any fixed state, then
it is not hard to verify that, in the next time step, the events {"node $v$ gets a good/bad color",
$v \in V$} become mutually independent. This will allow us to preserve good-balance properties for
a sufficiently long sequence of states visited by the Markovian Process, w.h.p.
The protocol works in 5 consecutive temporal phases: the goal of this phase partition is to control
the rate of new colored nodes as a function of the expected values reached by the r.v. $k_i^{(t)}$, $h_i^{(t)}$ (at
the end of each phase). Indeed, when such expected values reach some specific thresholds, the
protocol and/or its analysis must change accordingly in order to keep the coloring configuration
well balanced in the two communities during the whole process.

At any time step $t$, we denote, for each node $v \in V_i$, the number of $z_1$-colored neighbors of $v$ as
$N_1^v(t)$; $N_2^v(t)$ is defined similarly as the number of $z_2$-colored neighbors of $v$. Given a node $v \in V$,
the set of its neighbors at time $t$ will be denoted as $\Gamma_t(v)$. For the sake of brevity, whenever possible
we will omit the parameter $t$ in the above variables and, in the proofs, we will only analyze the
coloring in $V_1$, the analysis for $V_2$ being the same.
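The quantities used throughout the analysis, namely the numbers of well-colored and badly-colored nodes per community and the numbers of colored neighbors of a node, can be computed from a configuration as sketched below. The function names, the community labels 0 and 1, the use of `None` for uncolored nodes and the edge representation as pairs `(u, v)` are our own illustrative assumptions.

```python
def process_state(community_of, color_of, z):
    """Return (k1, k2, h1, h2): well-colored and badly-colored node counts
    for the two communities (labelled 0 and 1); z = (z1, z2)."""
    k = [0, 0]
    h = [0, 0]
    for v, community in enumerate(community_of):
        color = color_of[v]
        if color is None:            # uncolored nodes are not counted
            continue
        if color == z[community]:    # good color for this community
            k[community] += 1
        else:                        # color of the other community
            h[community] += 1
    return k[0], k[1], h[0], h[1]

def colored_neighbor_counts(v, edges, color_of, z):
    """Return the numbers of z1-colored and z2-colored neighbors of v
    in the snapshot given by `edges`."""
    n1 = n2 = 0
    for a, b in edges:
        if v == a:
            w = b
        elif v == b:
            w = a
        else:
            continue
        if color_of[w] == z[0]:
            n1 += 1
        elif color_of[w] == z[1]:
            n2 += 1
    return n1, n2
```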



2.1 Phase 1: source coloring 

The phase runs for $\tau_1 = c_1 \log n$ time steps, where $c_1 > 0$ is an explicit constant that will be fixed
later. In this phase, only the neighbors of the sources will decide their color. The goal is to reach a
state such that w.h.p. $k_i = \Theta(\log n)$ and $h_i = 0$ ($i = 1, 2$). For any non-source node $v$, the coloring
rule is thus the following.

• Let $i \in \{1, 2\}$; $v$ gets color $z_i$ if there is a time step $t \le \tau_1$ such that $s_i \in \Gamma_t(v)$ and, for $j \neq i$
and for all $t$ such that $1 \le t \le \tau_1$, it holds that $s_j \notin \Gamma_t(v)$;

• In all other cases, $v$ gets no color.
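The Phase 1 rule above can be sketched as follows: a non-source node is colored only if, over the whole phase, it has met exactly one of the two sources. The function name `phase1_colors`, the argument layout and the use of `None` for "no color" are hypothetical choices made for this illustration; the sketch evaluates the rule centrally, whereas in the protocol each node only needs to remember which sources it has seen.

```python
def phase1_colors(snapshots, sources, z, n):
    """Apply the Phase 1 rule.

    snapshots: list of edge sets, one per time step of the phase.
    sources:   pair (s1, s2) of source nodes; z = (z1, z2) their colors.
    """
    s1, s2 = sources
    met = {s1: set(), s2: set()}  # nodes that were neighbors of each source
    for edges in snapshots:
        for a, b in edges:
            if a in met:
                met[a].add(b)
            if b in met:
                met[b].add(a)
    colors = [None] * n
    colors[s1], colors[s2] = z
    for v in range(n):
        if v in met:              # sources keep their initial color
            continue
        sees_s1, sees_s2 = v in met[s1], v in met[s2]
        if sees_s1 and not sees_s2:
            colors[v] = z[0]
        elif sees_s2 and not sees_s1:
            colors[v] = z[1]
    return colors
```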

Thus, in order to analyze the process, it is appropriate to define the following r.v.s counting the
colored nodes at the end of Phase 1.

• The variable $X_1^v = 1$ iff $v$ gets color $z_1$, and the variable $X_1 = 1 + \sum_{v \in V_1} X_1^v$ describes the total
number of the $z_1$-colored nodes in $V_1$.

• The variable $Y_1^v = 1$ iff $v$ gets color $z_2$, and the variable $Y_1 = \sum_{v \in V_1} Y_1^v$ describes the total
number of the $z_2$-colored nodes in $V_1$.

The key property of the protocol in this phase is stated in the following theorem.

Theorem 1 Let $d_1 > 0$ be any (sufficiently large) constant. Then, a constant $c_1 > 0$ can be fixed
so that, at time step $\tau_1 = c_1 \log n$, the Markovian Process w.h.p. reaches a state such that

$$k_1^{(\tau_1)}, k_2^{(\tau_1)} \in \left[\frac{d_1}{16}\, pn \log n,\ 4 d_1\, pn \log n\right] \quad \text{and} \quad h_1^{(\tau_1)}, h_2^{(\tau_1)} = 0 \qquad (2)$$



Proof. The above theorem follows from the fact that, at the end of this phase, a node gets the good
color with probability $\Theta(p\tau_1)$ and, w.h.p., no node will get the bad color.
Claim 1 Let ti be such that pri ^ and ti = o . Then, starting from the initial
state $(k_1^{(0)} = 1, k_2^{(0)} = 1, h_1^{(0)} = 0, h_2^{(0)} = 0; E_0)$, at time step $\tau_1$ w.h.p. it holds that

$$\frac{np\tau_1}{16} \le X_1, X_2 \le 4np\tau_1$$

Proof. We first bound from below the number of $z_1$-colored nodes at the end of Phase
1. Notice that $P(X_1^v = 1) = (1 - q)^{\tau_1} (1 - (1 - p)^{\tau_1})$ and we can apply Lemma 10 to
each factor on the right side, thus getting:

P {X! = 1) = (1 - qr (1 - (1 - pD > Pri [(1 - qri (1 + 2q)) (1 - pn)] > ^ 

where in the last inequality we used $\lim_{n \to \infty} [(1 - q\tau_1 (1 + 2q)) (1 - p\tau_1)] = 1$.
The above inequality easily implies that 

that is 

E[x,l>!fi 

Since, by fixing any initial state (j^f''^ , k2^^ , h^^^ , h^2^ ; Eq^ , the random variables X\ are 
independent, we can apply the Chernoff Bound (jl7p with 5 = ^- Then, 

P (X, ^ ^) ^ e-i^"^- 

By hypothesis, we have that $np\tau_1 \ge \log n$, so, w.h.p. it holds that

$$X_1 \ge \frac{np\tau_1}{16}$$



A similar analysis, based on Lemma 10 and Chernoff bound (18), yields the stated upper
bound on the number of $z_1$-colored nodes at the end of Phase 1, that is, w.h.p. it
holds that

$$X_1 \le 4np\tau_1$$

□ 

Claim 2 Let ti ^ \ he such that qri = O (prr) for any e > 0. Then, starting from 
the initial state $(k_1^{(0)} = 1, k_2^{(0)} = 1, h_1^{(0)} = 0, h_2^{(0)} = 0; E_0)$, at time step $\tau_1$ it holds w.h.p.
that $Y_1 = 0$.

Proof. A sufficient condition for the event $Y_1 = 0$ is that no edge between any node in
$V_1$ and $s_2$ occurs at any time step of Phase 1. Hence, by Lemma 10,

$$P(Y_1 = 0) \ge (1 - q)^{|V_1| \tau_1} = (1 - q)^{n\tau_1/2} \ge 1 - 2q|V_1|\tau_1$$

Since qri = O(^^iqrj), the lemma is proved. 

□ 
Claims 1 and 2 easily imply Theorem 1.

□ 

2.2 Phase 2: fast coloring I 

This phase of the Protocol aims to get an exponential rate of the good coloring inside every community
in order to reach, in $\tau_2 = O(\log n)$ time, a state such that the number of well-colored nodes
is bounded by some root of $n$ and the number of badly-colored ones is still 0.
We consider the Markovian Process when, at the generic step $t$ of this phase, it is in any state
satisfying the following condition:



"'1 5 "^2 ^ 



—pn log n, dipn^'^"' log n 
16 



and /if \ 4*^=0 (3) 



Unlike in Phase 1, nodes can get a color at every time step according to the following rule:
for $\tau_1 < t \le \tau_1 + \tau_2$, at time step $t$ of Phase 2 every uncolored node $v$

• gets color $z_1$ at time $t + 1$ iff $N_1^v(t) > 0$ and $N_2^v(t) = 0$,

• gets color $z_2$ at time $t + 1$ iff $N_2^v(t) > 0$ and $N_1^v(t) = 0$,

• gets no color at time $t + 1$ otherwise.
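One synchronous step of the Phase 2 rule can be sketched as follows: an uncolored node adopts a color only if it currently sees that color and does not see the other one. The function name `phase2_step`, the list-based coloring with `None` for uncolored nodes and the edge representation are assumptions made for this illustration only.

```python
def phase2_step(edges, colors, z):
    """Return the coloring at time t + 1 given the snapshot `edges`
    and the coloring `colors` at time t; z = (z1, z2)."""
    n = len(colors)
    n1 = [0] * n  # number of z1-colored neighbors of each node
    n2 = [0] * n  # number of z2-colored neighbors of each node
    for a, b in edges:
        for v, w in ((a, b), (b, a)):
            if colors[w] == z[0]:
                n1[v] += 1
            elif colors[w] == z[1]:
                n2[v] += 1
    new_colors = list(colors)
    for v in range(n):
        if colors[v] is not None:   # colored nodes never change color
            continue
        if n1[v] > 0 and n2[v] == 0:
            new_colors[v] = z[0]
        elif n2[v] > 0 and n1[v] == 0:
            new_colors[v] = z[1]
    return new_colors
```

All decisions are taken from the coloring at time $t$, so a node colored in this step starts influencing its neighbors only from the next step on.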

For each time step $t$, $\tau_1 < t \le \tau_1 + \tau_2$, we thus define the following binary random variables

• $X_1^v(t) = 1$ iff $v \in V_1$ gets color $z_1$ at time $t + 1$, and $X_1(t) = \sum_{v \in V_1} X_1^v(t)$.

• $Y_1^v(t) = 1$ iff $v \in V_1$ gets color $z_2$ at time $t + 1$, and $Y_1(t) = \sum_{v \in V_1} Y_1^v(t)$.

In the next theorem, we assume that, at time step ri (i.e. at the end of Phase 1), the Markovian 
Process reaches a state satisfying Cond. ([2]). In particular, we assume that kj^ ^ fcp, where 
— = Y^pnlogn. Thanks to Theorem [H this event holds w.h.p. In what follows, we will make use 
of the following function 



F{n, k) — 2 max ■ 



log n polylog n 



Theorem 2 For any $\eta > 0$, positive constants $a$ and $\phi$ can be fixed so that, at the final step of
Phase 2



n 

■ n 



1 n^s'n] 



it holds w.h.p. that

for $i = 1, 2$, $n^a \le k_i^{(\tau_2)} \le n^a \log^\eta n$, and $h_i^{(\tau_2)} = 0$. (4)
Proof. We prove the following key fact: if $k_i = O(n^a)$ ($i = 1, 2$) for some constant $a < 1$, then it
holds w.h.p. $X = [(1 \pm o(1))pn/2] \cdot k_i$ and $Y = 0$. From such bounds, we then derive the recursive
equations for $k_i^{(t)}$ yielding the bounds stated in the theorem.

Recall that $q = O(p/n^b)$ and consider any positive constant $a$ such that $a < b$.

We first provide tight upper and lower bounds on the number of new colored nodes after one step
of the protocol.

Claim 3 For i = 1,2, it holds w.h.p. that 

I /logn \ / polylogn\ np (t) I /logn\ / polylogre\ np (t) 

Y,{t) = 0, 



Proof. Observe that $P(Y_i(t) = 0)$ is lower bounded by the probability that in $E_t$ there
is no edge between any node in $V_1$ and any node in $V_2$ which is already colored $z_2$. By
the hypothesis (3) and the conditions on $p$ and $q$, we can thus apply Lemma 10 and get

P (Yi = 0) ^ (1 - qf^^''^' ^ 1 - 2,1^114*) ^ 1 - 



proving that w.h.p. $Y_i(t) = 0$. Again, thanks to Condition (3) and the conditions on $p$
and $q$ in (1), we get



1 — (1 — p) 1 I (1 — (7) 2 . 



P = 1) ^ k?p (1 - f 1 - 24*^.) ^ kf^P ( 1 - 



P (xr = 1) ^ kPp (1 + 2p) ^ kPp ( 1 + P^i^ 



We can thus bound the expected number of new well-colored nodes E [Xi] = {\Vi\ - kf ) P {XI 
V^Pj^it) ( 1 _ P°lylogn\ ^ np^(,) / ^ polylogn\ 



By applying the Chernoff Bounds ([IT]) and (([T8])) with S = v/^, we get 



P < 1 



p Ui ^ 1 + 



'log n 

IF, 

/log n 



^ _ polylogn \ npj^it) 
ni-" / 2 ^ 



polyiogn\ np (t) 



lognnp.wA P0lyl0gn\ 

2,,(t) 2'«1 ^T^H 1 1 

e 1 ^ ^ ^ — 5 

°Knnp,(t)/ P0lyl0gn \ 



1 
This implies that, w.h.p.,



'log n 
k 



□

Let us observe that $k_i^{(t+1)} = k_i^{(t)} + X_i(t)$. So, Claim 3 easily implies the following recursive bounds



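Claim 4 For $i = 1, 2$, it holds w.h.p. that

$$\left(1 + \frac{np}{2}\left(1 - F(n, k_i^{(t)})\right)\right) k_i^{(t)} \le k_i^{(t+1)} \le \left(1 + \frac{np}{2}\left(1 + F(n, k_i^{(t)})\right)\right) k_i^{(t)} \quad \text{and} \quad h_i^{(t+1)} = 0.$$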



We can now analyze the spreading of the good coloring. The idea is to derive the closed formula
corresponding to the recurrence relation provided by Claim 4 and analyze it in two different spans
of time: in the first span, we let $k_i$ increase enough so that, in the second span, we can apply a
stronger concentration result. Recall that, thanks to Theorem 1, Phase 2 starts with the Markovian
Process in a state satisfying Condition (2), which implies Condition (3). Moreover, we will fix the
final time step $\tau_2$ so that Condition (3) holds for all time steps of Phase 2: this implies
that we can apply Claim 4 for all such steps.
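To get a feeling for the time scale of this phase, one can iterate the idealized recurrence $k^{(t+1)} = (1 + np/2)\, k^{(t)}$, i.e., the growth suggested by the key fact $X = [(1 \pm o(1))pn/2] \cdot k_i$ with the lower-order terms dropped. The sketch below is only such an idealization: the function names are hypothetical and the error term $F$ is ignored. It compares the step-by-step iteration with the closed form $\lceil \log(\mathrm{target}/k) / \log(1 + np/2) \rceil$.

```python
import math

def steps_to_reach(k_start, target, n, p):
    """Count the steps of k <- (1 + n*p/2) * k needed to reach `target`."""
    growth = 1 + n * p / 2
    k = float(k_start)
    steps = 0
    while k < target:
        k *= growth
        steps += 1
    return steps

def closed_form_steps(k_start, target, n, p):
    """Closed form of the same quantity, obtained by unrolling the recurrence."""
    return math.ceil(math.log(target / k_start) / math.log(1 + n * p / 2))
```

With $np$ constant, the number of steps needed to grow from a logarithmic to a polynomial number of well-colored nodes is logarithmic in $n$, in line with the $O(\log n)$ length of the phase.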
Let $t^* \ge \tau_1$ be defined as



iog-M(i + 



np 



l-F(n,fc^))))lo, 



log n 



k 



in) 



+ Ti. 



Since $t^* - \tau_1 \in O(\log n)$, thanks to Lemma 12, we can unroll (backward) the recursive relation from
time $t^*$ to time $\tau_1$ and get



np\t*~Ti 



1 - F{n, k\ 



(n)^ 



t*-Tl 



k 



(n) 



^ k 



np\t*~'^i 



1 + F{n, k\ 



t*-Tl 



k 



(n) 



(5) 



We observe that the value of k^^ ^ j^pnlogn can reach any arbitrarily large constant by tuning 
the constant di in Theorem [U so, F{n,k^j^^^) can be made arbitrarily small. From this fact and 
Eq. m we have that kf ^ € [log^ n, log'^^^ n] , where fi can be made arbitrarily small by decreasing 
F{n,k^^^) (i.e. by increasing di in Theorem [T]). Notice that, at any time step t ^ t*, Condition 
([3]) is largely satisfied. 

We now unroll the recursive relation from time $\tau_2$ to time $t^*$ and get



1 + 



1 - F{n,kf^) 



T2-t* 



np\'r'2-t* 



l + F{n,k\ 



T2-t* 



in 



(6) 



We observe that, with a suitable choice of the positive constant (j) G (0, 1), for 

np^ 



r2 = log-^ (1 + f ) log 



n 



)log n 
it holds that



1 - F{n,k\ 



T2~t* 



and 



1 + F(n, k\ 



T2-i* 1 
< - 



By replacing T2 into Eq. [6l with a suitable choice of > /i (remind that can in turn be made 
arbitrarily small), we finally get ^ ^ n° log''n. Again, observe that, for all time steps 

$t \le \tau_2$, Condition (3) is largely satisfied: this implies that at each of these steps we were able to
apply Claim 4.

As for the bad coloring, observe that Claim 4 guarantees (w.h.p.) $h_1^{(t+1)}, h_2^{(t+1)} = 0$; then, from Lemma
12, it holds w.h.p. that $h_1^{(\tau_2)} = 0$ and $h_2^{(\tau_2)} = 0$.



□ 



2.3 Phase 3: fast coloring II 

In this phase nodes apply the same rule as in Phase 2, but we need to separate the analysis from the
previous one since, when the "well-colored" subset gets larger than some root of $n$, we can
no longer exploit the fact that the bad coloring has w.h.p. not started yet (i.e. $h = 0$). However, we
will show that when the well-colored sets get size $O(n/\mathrm{polylog}\, n)$, the bad-colored sets still have
size bounded by some root of $n$. We assume that, at the end of Phase 2, the Markovian Process
reaches a state satisfying Condition (4) of Theorem 2 and that, at the generic step $t$ of the current phase,
it is in any state satisfying the following condition



for i = 1,2 : # G 



n 



log^ n 



and hf = 0{n''^), where oi < 02 < 1 (7) 



Theorem 3 For any constant $\eta > 0$, constants $a_1 < 1$ and $\gamma > 0$ can be fixed so that at the final
time step of Phase 3

for i = 1,2, it holds w.h.p. that 



log'^n log'^ ''n 

Proof. Let $X$ and $Y$ be the r.v.s defined in the previous subsection. The presence of the bad coloring
changes the bounds we obtain as follows. At time step $t + 1$, as long as $k_i, h_i = O(n/\mathrm{polylog}\, n)$, we
prove that $X = [(1 \pm o(1))pn/2] \cdot k_1$ and $Y = [(1 \pm o(1))(ph_1 + qk_2)]\,n/2$. From the above bounds,
we then determine two time-recursive bounds on the r.v. $k_i^{(t)}$ and $h_i^{(t)}$ that hold (w.h.p.) for any $t$
s.t. $k_i^{(t)}, h_i^{(t)} = O(n/\mathrm{polylog}\, n)$. Then, thanks to the hypothesis $q = O(p/n^b)$ and to the fact that the
Markovian Process starts Phase 3 from a very "unbalanced" state ($k_i = \Omega(n^a)$ and $h_i = 0$), we
apply the recursive bounds and show that a time step $\tau_3$ exists satisfying Eq. (8).
We start by providing tight upper and lower bounds on the number of the well-colored nodes at
a generic step of Phase 3 (some of the proofs are similar to those of Phase 2, so they will only be
sketched).
Claim 5 A constant $C > 0$ exists such that, for $i = 1, 2$, it holds w.h.p. that

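Sketch of Proof. By neglecting the contribution of $h_2^{(t)}$, from the facts $pk_1^{(t)}, qk_2^{(t)}, ph_1^{(t)} =
o(1)$ and Lemma 10, we have that

$$P(X_1^v = 1) \ge \left(1 - (1 - p)^{k_1^{(t)}}\right) (1 - q)^{k_2^{(t)}} (1 - p)^{h_1^{(t)}} \ge pk_1^{(t)} \left(1 - pk_1^{(t)}\right) \left(1 - 2qk_2^{(t)}\right) \left(1 - 2ph_1^{(t)}\right)$$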



Observe that 

_ pfcW) (l - 2qkf) (l - 2phf) ^ (l - e 



log^ n 



then 

p (xr = 1) ^ pkf {i-e (^)) (10) 

We now provide an upper bound on $P(X_1^v = 1)$. From the Union Bound and Lemma
10 we get

P {XI = 1)^ pkf (1 + 2p) + qhf (1 + 2q) ^ pkf^ (l + Q ) ■ (H) 



As usual, we exploit Eqs. (10) and (11) to bound the expectation

$$E[X_1] = \left(|V_1| - \left(k_1^{(t)} + h_1^{(t)}\right)\right) \cdot P(X_1^v = 1)$$



For some constant C > 0, w.h.p. it thus holds that 



2 \ logn / 2 \ logn / 

We can use the Chernoff Bounds (17) and (18) with $\delta = 1/\log n$ to get that, for some
constant $C > 0$, w.h.p.

2 1 V log™/ 2 ^ V logny ^ ' 

From the above inequality, it follows that 



(i + f)(i 



2 + np log n J \ 2/y 2 + np log n 
Since 2+np ^ ^® bounded by a constant, for the sake of simplicity we can just re-define C
as any fixed constant such that 

□ 

Claim 5 implies the following properties of the well-coloring for any state within Phase 3 (including
the final one at time $\tau_3$).

Claim 6 A constant $C > 0$ exists such that, for $i = 1, 2$, it holds w.h.p. that

Sketch of Proof. Since Claim [5] holds as long as ki ^ ^ , from Lemma [12] we get the 
claim by applying the same unrolling argument shown in the previous phase. □

We now exploit the above lemma to provide a bound on the number of bad-colored nodes at the 
end of Phase 3. 

Claim 7 For any positive constant $\gamma$, a constant $a_1$, with $1 - a < a_1 < a_2$, can be fixed
so that by choosing the final time step of Phase 3

it holds w.h.p. that, for $i = 1, 2$ and for all $t \le \tau_3$, $h_i^{(t)} \le n^{a_1}$.

Sketch of Proof. In order to bound the rate of $h_1^{(t)}$, we consider the r.v. $Y_1^v$ when the
Markovian Process is in a generic state satisfying Condition (7). Thanks to Theorem
2 we know this (largely) holds for the first step of Phase 3 and, by the choice of $\tau_3$, we
will see this (w.h.p.) holds for all $t \le \tau_3$ by induction.

By neglecting the contribution of $k_2^{(t)}$, we have that

$$P(Y_1^v = 1) \ge \left(1 - (1 - p)^{h_1^{(t)}}\right) (1 - q)^{h_2^{(t)}} (1 - p)^{k_1^{(t)}}$$

Since $ph_1^{(t)}, qh_2^{(t)}, pk_1^{(t)} = o(1)$, we can apply Lemma 10 to each factor on the right side,
thus obtaining

$$P(Y_1^v = 1) \ge ph_1^{(t)} \left(1 - ph_1^{(t)}\right) \left(1 - 2qh_2^{(t)}\right) \left(1 - 2pk_1^{(t)}\right)$$
^ ph\ (l-Q 



1 



log^ n 
We now provide an upper bound on $P(Y_1^v = 1)$. By the Union Bound and Lemma 10
we get

$$P(Y_1^v = 1) \le ph_1^{(t)} (1 + 2p) + qk_2^{(t)} (1 + 2q) \le \left(ph_1^{(t)} + qk_2^{(t)}\right) (1 + 2p)$$

As for the expected value of new bad-colored nodes, for some constant ^ > 0, it holds 
that 

^hf f 1 - E [y^] ^ (^/^w + ^^kf) f 1 + ^1 (13) 



From the Chernoff bound and Eq. (13), it follows that w.h.p. $h_i$ will not "jump" from
a sublogarithmic value to a polynomial one: in other words, in the first time that T will 
be at least log"^ n, we have that /if = O(polylogn). 



for each t^T'm. Phase 3 w.h.p. we have 



Hence, again from the Chernoff Bound and Eq. ()13p . setting 5 = .j^^^jw, we see that 



(14) 



logn 



In Eq. (14), we can bound the term using Claim [6] and Theorem [2], obtaining for
some positive constant $c$

At) /"i + ^V~"' 

^ < ("i + npy-^^ \ ^ log"; 1 < f 1 + I^V""'" . c 



2n" V 2 / 2n°' V 2 

Therefore we can use Eq. (14) to get that w.h.p.

((I. f),.<".(i.f )'-".. (15) 

Hence, unrolling $h_i^{(t)}$ until time $t$, for some positive constants $c_1$ and $c_2$, Eq. (15)
becomes (keeping high probability thanks to Lemma [12])



<c,(l + ^)'"'-^r'+<:2logn(l + f)"" 

and the last side turns out to be 0(n^~'^ -polylogn) when t + 1 = rs, proving the lemma. 

□ 
The bound claimed for $h_i$ follows from Claim [7] and the bounds claimed for $k_i$ follow from Claim
[6] for $t = \tau_3$, thus proving Theorem [3].

□ 

Theorems [2] and [3] guarantee a very tight range for the r.v.s $k_1$ and $k_2$ at the final steps of Phases 2
and 3, respectively. As we will see later, this tight balance is crucial for removing the hypothesis
on the existence of the two leaders.

2.4 Phase 4: controlled saturation 

At the end of Phase 3, the Markovian Process w.h.p. reaches a state that satisfies the properties
stated in Theorem [3]. The goal of Phase 4 is to obtain a (large) constant fraction a (say, a = 3/4)
of the nodes of each community that get the good color and, at the same time, to ensure that the
number of bad-colored nodes is still bounded by some root of n. We cannot guarantee this goal
by applying the same coloring rule of the previous phase: the number of bad-colored nodes would
increase too fast. The protocol thus performs a much "weaker" coloring rule that is enough for the
good coloring but keeps the final number of bad-colored nodes bounded by some root of n.
The fourth phase consists of three consecutive identical time-windows during which every (colored
or not) node $v \in V$ applies the following simple rule:

For any $t \in [\tau_3 + 1, \tau_3 + T_4]$, with $T_4 = c_4 \log n$, $v$ looks at the colors of its neighbors at time $t$ and:

• If $v$ sees only one color (say, $z$) for all the window's time steps, then $v$ gets color $z$;

• In all the other cases (either v sees more colors or v does not see any color), v keeps its color 
(if any) or it remains uncolored. 
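
The window rule above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `snapshot` and `phase4_window` are hypothetical, neighbor colors are assumed to be frozen during a window and updated only at its end, and the snapshot generator simply draws every intra-community edge with probability p and every inter-community edge with probability q.

```python
import random

def snapshot(n1, n2, p, q, rng):
    """Sample one graph snapshot as adjacency lists (illustrative helper).

    Nodes 0..n1-1 form the first community, nodes n1..n1+n2-1 the second.
    Intra-community edges appear with probability p, the others with q.
    """
    n = n1 + n2
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            same_community = (u < n1) == (v < n1)
            if rng.random() < (p if same_community else q):
                adj[u].append(v)
                adj[v].append(u)
    return adj

def phase4_window(colors, snapshots):
    """Apply one window of the weak coloring rule.

    colors[v] is a color or None (uncolored). A node adopts color z only if
    z is the unique color it observed among its neighbors over the whole
    window; otherwise it keeps its current color (or stays uncolored).
    Assumption: colors are read at the start of the window and updated at
    the end.
    """
    n = len(colors)
    seen = [set() for _ in range(n)]
    for adj in snapshots:
        for v in range(n):
            for u in adj[v]:
                if colors[u] is not None:
                    seen[v].add(colors[u])
    return [next(iter(seen[v])) if len(seen[v]) == 1 else colors[v]
            for v in range(n)]
```

For example, with `colors = [1, None, 2, None]` and two snapshots in which node 1 only meets node 0 while node 3 meets both node 0 and node 2, node 1 gets color 1 and node 3 stays uncolored because it saw two colors.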

The above Protocol window is repeated 3 times for a specific setting of the constant $c_4$ that will be
determined in the proof of Theorem [4]. We assume that the Markovian Process terminates Phase
3 reaching a state that satisfies the properties stated in Theorem [3]; thanks to Theorem [3], we know this holds w.h.p.

Theorem 4 Let $a$ be any constant such that $0 < a < 1$. Then, constants $c_4$ and $a_1 < 1$ can be
fixed so that, at time step $\tau_4 = \tau_3 + 3T_4$, the Markovian Process w.h.p. reaches a state such that, for
$i = 1,2$,

$k_i^{(\tau_4)} \ge a n$, and $h_i^{(\tau_4)} \le n^{a_1}\,\mathrm{polylog}\, n$. (16)

Proof. 

We first bound the number of new bad-colored nodes. 

Claim 8 For any constant $c_4$, at time step $\tau_4 = 3T_4 + \tau_3 = 3c_4 \log n + \tau_3$, the Markovian
Process is w.h.p. in a state such that $h_i^{(\tau_4)} = O(n^{a_1}\,\mathrm{polylog}\, n)$, for $i = 1,2$.

Sketch of Proof. Let $h_1 = h_1^{(\tau_3)}$ and $k_2 = k_2^{(\tau_3)}$. At the end of the first window, for any
node $v \in V_1$, it holds that

$P(Y_v = 1) \le T_4 (p h_1 + q k_2)$

So, since it holds $a_1 > 1 - a > 1 - b$, we get $E[Y_1] \le 2 n p h_1 T_4 = O(n^{a_1}\,\mathrm{polylog}\, n)$.
By repeating the same reasoning for the other 2 windows and by applying the Chernoff 
bound, the thesis follows. 

□ 

For the sake of brevity, we define ki = k^^\ k2 = k^^\ and h = h^^\ 

Let us consider a node $v \in V_1$ at the end of the first time window of Phase 4. For some constant
$\gamma > 0$, it holds that



k2T4 



\ logi-^ny^ 

Since ^ ^ p ^ by computing the expected value of the sum of all X{"s and by applying the 
Chernoff bound, we get that the number of well-colored nodes in Vi at the end of the first time 
window of Phase 4 is w.h.p. 



k^^ ^ d4 



n 



log n 

where $d_4 = d_4(c_4)$ is a positive constant that can be made arbitrarily large by increasing $c_4$ in
$T_4 = c_4 \log n$.

We have thus shown that, after the first window, the number of well-colored nodes inside each
community is increased by a factor $d_4 \log n$. We can then repeat the same analysis for the second
and the third windows (which are necessary when $p = o(\log n/n)$). Let us consider the sparsest
case $p = 1/n$ (the other cases are easier). In this case, at the end of the third window, it can be
easily verified that:



P (X-, = 1) ^ (1 - (1 (1 polylogn) T4(i _ ^)nT4 




The last bound can be thus made arbitrarily close to 1 by increasing the constant C4. Hence, w.h.p. 

where constant a can be made arbitrarily close to 1 by suitably choosing the constant C4 in T4 = 
C4 log n. 

As for the bad coloring, the thesis follows from Claim [8].

□ 



2.5 Phase 5: majority rule. 

Theorem [4] states that, at the end of Phase 4, the Markovian Process w.h.p. reaches a state where
a (large) constant fraction of the nodes (say, 3/4) in both communities is well-colored while only
$O(n^{a_1}\,\mathrm{polylog}\, n)$ nodes are bad-colored. We now show that a further final phase, where nodes apply
a simple majority rule, yields the good coloring, w.h.p. Every node $v \in V$ applies the following
coloring rule:

• For every $t \in [1, T_5]$, with $T_5 = c_5 \log n$, every node $v$ observes the colors of its neighbors at time $t$
and, for every color $z_i$ ($i = 1,2$), $v$ computes the number $f_i^t$ of its neighbors colored with $z_i$.

• Then, node $v$ gets color $z_1$ if $\sum_{t \in [1, T_5]} f_1^t \ge \sum_{t \in [1, T_5]} f_2^t$; otherwise $v$ gets color $z_2$.
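
The majority rule just described can be sketched as follows. This is a minimal illustration rather than the authors' code: the name `majority_rule` is hypothetical, snapshots are assumed to be given as adjacency lists, colors are assumed to stay frozen during the phase, and ties are resolved in favor of the first color, which is an assumption of this sketch.

```python
def majority_rule(colors, snapshots, z1, z2):
    """Recolor every node by majority over a whole observation window.

    colors[v] is z1, z2, or None (uncolored); snapshots is a list of
    adjacency lists, one per time step of the phase. For each node we
    accumulate, over all time steps, how many neighbors carried z1 and
    how many carried z2, then pick the color with the larger total.
    Assumption: ties go to z1.
    """
    n = len(colors)
    count_z1 = [0] * n
    count_z2 = [0] * n
    for adj in snapshots:
        for v in range(n):
            for u in adj[v]:
                if colors[u] == z1:
                    count_z1[v] += 1
                elif colors[u] == z2:
                    count_z2[v] += 1
    return [z1 if count_z1[v] >= count_z2[v] else z2 for v in range(n)]
```

Summing the counts over the whole window, rather than voting step by step, is what lets a Chernoff-type argument apply to each node's two totals.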

Let us assume the Markovian Process starts Phase 5 from a state satisfying Eq. (16) (say with
constant a = 3/4).

Theorem 5 A constant $c_5 > 0$ can be fixed so that, at time $\tau_5 = \tau_4 + c_5 \log n$, every node of each
community is well-colored, w.h.p.

Sketch of Proof. Let us consider a node $v \in V_1$ and, for every time step $t$ of Phase 5, define the
r.v. $X_v^t$ counting the number of its $z_1$-colored neighbors and the r.v. $Y_v^t$ counting the number of
its $z_2$-colored neighbors in $E_t$. Then, define the two sums

$X_v = \sum_{t \in [\tau_4 + 1, \dots, \tau_5]} X_v^t$ and $Y_v = \sum_{t \in [\tau_4 + 1, \dots, \tau_5]} Y_v^t$

Let us also define the subset 

$C^{\tau_4} = \{ v \in V_1 \mid v \text{ is } z_1\text{-colored at time } \tau_4 \}$

Thanks to Condition (16) (with constant a = 3/4), it holds that

I 1/41^1 8 

From the above inequality, the expected values of the r.v.s $X_v$ and $Y_v$ can be easily bounded as follows
B[X,] ^ ^ P((n,7;) G St) ^ ^pnrs , and 

telT4 + l,...,Ts] U&G^A 



te[T4 + l,...,T5] \n0G^4 UGV2 / 



Finally, observe that $X_v$ and $Y_v$ are sums of independent binary r.v.s (thanks to the $\mathcal{G}(n,p,q)$
model). Since $p \ge 1/n$ and $\tau_5 = \tau_4 + c_5 \log n$, we can thus choose a suitable constant $c_5 > 0$ and
apply the Chernoff bound to get the thesis.

□ 



2.6 Overall completion time of the Protocol and its optimality 

When $p$ and $q$ satisfy Cond. (1), we have shown that every phase has length $O(\log n)$: the Protocol
has thus an overall completion time $O(\log n)$. In Section [3] we will show that for $p = o(1/n)$ the
length of each phase must be stretched to $\Theta\left(\frac{\log n}{pn}\right)$.

As for the time optimality of our protocol we can state the following 

Theorem 6 If $p = O(1/n)$ and $q = O(p/n^b)$ for some constant $b > 0$, then any good-coloring
protocol requires $\Omega(\log n)$ expected time.
Sketch of Proof. If $p = O(1/n)$, starting from the initial random graph-snapshot, it is easy to
show that there is non-negligible probability that some node will be isolated for $\tau(n)$ time steps,
where $\tau(n)$ is any increasing function such that $\tau(n) = o(\log n)$. It is clear that, in this time window,
such isolated nodes cannot get a good color (w.h.p.).

□ 

3 More General Settings 

In this section, we show some relevant generalizations that can be efficiently solved by simple 
adaptations of our protocol and/or its analysis. 

- Removing the presence of two leaders. So far we have assumed that, in the initial state of 
the coloring process, there are exactly two source nodes, one in each community, which are colored 
with different colors. This assumption can be removed by introducing a preliminary phase in which 
a randomized source election is performed and by some further changes that are described below. 
In the first step, every node, by an independent random choice, becomes a source with probability
$\frac{d \log n}{n}$ for a suitable constant $d > 0$. This clearly guarantees that, in every community, there are
w.h.p. 0(logn) sources. Then, every source Si randomly chooses a color Zi G [n^]. This implies 
that the minimal color $z_1$ in the first community and the minimal color $z_2$ in the second community
are different w.h.p.
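
The randomized source election can be illustrated with a short sketch. Two quantities are assumptions of this sketch because they are not legible in this copy of the text: the election probability is taken to be d·log(n)/n (which yields a logarithmic number of sources per community in expectation), and the size of the color space is left as a free parameter `color_range`. The function name `elect_sources` is hypothetical.

```python
import math
import random

def elect_sources(n, d, color_range, rng):
    """Return a dict {node: color} of the self-elected sources.

    Assumptions (not taken verbatim from the paper): each node elects
    itself independently with probability d*log(n)/n, and every source
    draws its color uniformly from {1, ..., color_range}.
    """
    prob = min(1.0, d * math.log(n) / n)
    sources = {}
    for v in range(n):
        if rng.random() < prob:
            sources[v] = rng.randrange(1, color_range + 1)
    return sources

def minimal_color(sources, community):
    """Smallest color chosen by the sources inside one community."""
    chosen = [color for v, color in sources.items() if v in community]
    return min(chosen) if chosen else None
```

A usage sketch: `elect_sources(10**5, 4.0, 10**15, random.Random(1))` returns a few dozen sources; with such a large color range, two sources picking the same color is very unlikely, which is the property the minimal-color argument relies on.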

Let $a$ and $b$ be the number of sources chosen in $V_1$ and $V_2$, respectively, and define $\ell = a + b$. We
summarize the above arguments in the following

Fact 1 Two positive constants $\eta_1 < \eta_2$ exist such that at the end of the first step w.h.p. it holds
that $\eta_1 \log n \le a, b \le \eta_2 \log n$ and $z_1 \ne z_2$.

The generic state of the modified Markovian Process is represented by the following set of random 
variables: 

$(a, b;\ k_1^1, \dots, k_a^1,\ h_1^1, \dots, h_b^1,\ k_1^2, \dots, k_b^2,\ h_1^2, \dots, h_a^2)$

where $k_j^i$ equals the number of nodes in $V_i$ colored by the same (good) color as the $j$th source of
$V_i$ while $h_j^i$ equals the number of nodes in $V_i$ colored by the same (bad) color as the $j$th source of
$V_r$ with $r \ne i$. At every time step $t$, for any $v \in [n]$ we define the r.v. $N_j^v(t)$ as the number of
$v$-neighbors colored with color $z_j$ at time $t$.

The first three phases of the Protocol are identical to the 2-source case since the impact of the 
presence of $O(\log n)$ colors in each of the two communities remains negligible until the overall
number of colored nodes in each community is 0(n/log^n). By applying the same analysis of the 
2-source case, at the end of Phase 3, we can thus show that the Markovian Process w.h.p. reaches 
a state having similar properties to those stated in Theorem [3]. We recall that $p$ and $q$ belong to
the ranges in Cond. (1).

Theorem 7 We can choose a suitable $\tau_3 = \tau_2 + (c_3 + o(1)) \log n$ so that, at the end of Phase 3,
the Markovian Process w.h.p. reaches a state in which for $i = 1,2$ it holds

Vj G [a] — ^ ^ /cj ^ , Zv ! ^ [^] hi = polylogn 
log n log ' n 

where rj is a constant that can be made arbitrarily small. 
We need to stop at a "saturation level" O(n/log^~^n) for every good color, since we want to
guarantee (w.h.p.) that the minimal color infects at least n/polylogn nodes. Then, as in the 
2-source case, the protocol starts a controlled saturation phase (i.e. Phase 4) that consists of (at 
most) 4 consecutive time windows in which every node applies the following minimal-color
rule:

For $t = 1$ to $T_4 = c_4 \log n$ time steps, $v$ observes the colors of its neighbors and gets the
minimal color $z$ among all the observed colors.

Thanks to the above rule, the number of nodes colored by the minimal good-color increases by a
logarithmic factor at the end of each of the four windows. This fact can be proved by using the
same arguments of the proof of Theorem [4].

It thus follows that, at the end of Phase 4, the number of nodes colored with the good minimal 
color is at least a constant (say 3/4) fraction of all the nodes of the community. Then, as in the 
2-source case, every node can apply the majority rule in order to get the right color w.h.p. 

- The case p-unknown. Our protocol relies on the fact that nodes know the parameter $p = d/n$:
the lengths of the protocol's phases are functions of $p$. So an interesting issue is to consider the
scenario where nodes do not know the parameter $p$ (i.e. the expected degree). Thanks to edge
independence, the dynamic random-graph process can be seen by every node as an independent
sequence of random samples. Indeed, at every time step $t$, every node $v$ can store the number $|N^v(t)|$
of its neighbors and it knows that this number has been selected by $n - 1$ independent experiments
according to the same Bernoulli distribution with success probability $p = d/n$. The goal is thus to use
such samples in order to get a good approximation of $p$. If $p \ge 1/n$, by using a standard statistical
argument, every node w.h.p. will get the value of $p$ up to some negligible factor in $O(\log n)$ time.
Let us see this task more formally.

For $c \log n$ time steps (where $c$ is a constant that will be fixed later), every node $v$ stores the values
$|N^v(1)|, |N^v(2)|, \dots, |N^v(c \log n)|$; then it computes $S = |N^v(1)| + \dots + |N^v(c \log n)|$. Since $S$ is the
sum of $c \log n \cdot (n - 1)$ Bernoulli random variables of parameter $d/n$, we get a binomial distribution
with mean $E[S] = d\, c \log n \left(1 - \frac{1}{n}\right)$. Then, every node uses the estimator $D(S) = \frac{S}{c \log n} \cdot \frac{n}{n-1}$ to
guess $d$. We can use the Chernoff bound in order to determine a confidence interval for $D(S)$, as
follows

/ 77 — 1 77 — 

P(d^ [D{S)-S,D{S) + S])=P{ Si [^[S]-5c\ogn , E [5] + 5c log n 

\ n n 

P (5 < E [5] (1 - ) + P (5 > E [5] (1 + ) < e-^^I^l + < 4 " 

It thus follows that, for any $d \ge 1$, we can choose $\delta = \sqrt{d}$ and $c$ sufficiently large in order to get
a good confidence interval for all nodes of the network. This obtained approximation suffices to
perform an analysis of the protocol which is equivalent to that of the case p-known.
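
The estimation step can be tried out with a small simulation. This is a sketch under stated assumptions: the function names are hypothetical, the estimator is the unbiased rescaling of the average observed degree by n/(n-1), and the degree samples are drawn directly as sums of n - 1 independent Bernoulli trials of parameter d/n, which is what edge independence gives for a single node.

```python
import random

def estimate_expected_degree(degree_samples, n):
    """Estimate d from the degrees observed by one node.

    degree_samples[t] is the node's degree at time step t. Each degree is a
    sum of n - 1 Bernoulli(d/n) trials, so the sample mean has expectation
    d * (n - 1) / n; multiplying by n / (n - 1) removes that bias.
    """
    steps = len(degree_samples)
    return (sum(degree_samples) / steps) * n / (n - 1)

def sample_degrees(n, d, steps, rng):
    """Draw `steps` independent degree observations for a single node."""
    p = d / n
    return [sum(1 for _ in range(n - 1) if rng.random() < p)
            for _ in range(steps)]
```

For instance, `estimate_expected_degree(sample_degrees(200, 4.0, 50, random.Random(7)), 200)` typically lands within a few tenths of 4; longer observation windows narrow the interval, in line with the Chernoff argument.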

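- Edge Markovian Evolving Graphs. Let us consider an Edge-MEG $\mathcal{G}(n, p_\uparrow, p_\downarrow, q_\uparrow, q_\downarrow, E_0)$
defined in the introduction and assume that $q_\uparrow = O(p_\uparrow / n^b)$. If $0 < p_\uparrow, p_\downarrow, q_\uparrow, q_\downarrow < 1$, it is easy to see
[6] that the (unique) stationary distributions of the two corresponding 2-state edge-Markov chains
(inside and outside the communities, respectively) are

$\pi_p = \left(\frac{p_\downarrow}{p_\uparrow + p_\downarrow}, \frac{p_\uparrow}{p_\uparrow + p_\downarrow}\right)$ and $\pi_q = \left(\frac{q_\downarrow}{q_\uparrow + q_\downarrow}, \frac{q_\uparrow}{q_\uparrow + q_\downarrow}\right)$

It thus follows that the dynamic graph, starting from any $E_0$, converges to the (2-communities)
Erdős-Rényi random graph with edge-probability functions

$\hat{p} = \frac{p_\uparrow}{p_\uparrow + p_\downarrow}$ (Inside Communities) and $\hat{q} = \frac{q_\uparrow}{q_\uparrow + q_\downarrow}$ (Outside Communities)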




consecutive time steps. If we observe any event at time $t$ related to $E_t$ (such as the number of
well-colored nodes), then $E_{t+1}$ is no longer random according to the stationary distribution.
It thus follows that we need to change the way the protocol works over the dynamic random graph.
Let $M = \max\{M^{in}, M^{out}, \log n\}$; then, by definition of mixing time, starting from any edge subset
$E_t$ at time $t$, at time $t + \Delta$ with some $\Delta = O(M)$, if $u,v \in V_1$ or $u,v \in V_2$ then edge $(u,v)$ exists
with probability close to $\hat{p}$; otherwise it exists with probability close to $\hat{q}$. In other words, whatever the
state of the coloring process is at time $t$, after a time window proportional to the mixing time, the
dynamic graph is random with a distribution which is very close to the stationary one. We can
thus modify our protocol for the dynamic Erdős-Rényi graph model $\mathcal{G}(n,p,q)$ in order to "wait
for mixing". Between any two consecutive steps of the original protocol there is a quiescent
time-window of length $\Theta(M)$ where every node simply does nothing. Then, the analysis of the protocol
over $\mathcal{G}(n, p_\uparrow, p_\downarrow, q_\uparrow, q_\downarrow, E_0)$ is similar to that in Section [2] working for the dynamic Erdős-Rényi graph
$\mathcal{G}(n,p,q)$. We can thus state that, under the condition $q_\uparrow = O(p_\uparrow / n^b)$ for some constant $b > 0$, this

version of our protocol w.h.p. performs a good-coloring in time O (^M ■ max |logn, ^^^|^ • We 
finally observe that, for the "realistic" case = ©(1) (see the discussion in the Introduction), 

the mixing-time bound $M$ turns out to be $O(\log n)$: we thus get only a logarithmic slowdown factor
w.r.t. the good-coloring in the dynamic Erdős-Rényi graph $\mathcal{G}(n,p,q)$.
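
The stationary edge probability of a single 2-state edge chain can be checked numerically. The sketch below is only illustrative: `birth` and `death` are hypothetical parameter names standing for the probability that an absent edge appears and that a present edge disappears in one step, and the closed form used is the standard stationary distribution of a two-state Markov chain.

```python
import random

def stationary_edge_probability(birth, death):
    """Stationary probability that the edge is present.

    Standard two-state chain: absent -> present with probability `birth`,
    present -> absent with probability `death`.
    """
    return birth / (birth + death)

def simulate_edge(birth, death, steps, rng, present=False):
    """Fraction of time steps in which the simulated edge is present."""
    on_steps = 0
    for _ in range(steps):
        if present:
            present = rng.random() >= death   # edge survives this step
        else:
            present = rng.random() < birth    # edge is born this step
        on_steps += present
    return on_steps / steps
```

With `birth = 0.1` and `death = 0.3` the closed form gives 0.25, and a long simulated run started from an absent edge spends roughly a quarter of its steps in the "present" state, whatever the starting state.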

- More Communities. The presence of a constant number $r = O(1)$ of unknown equally-sized
communities can be managed with a similar method to that described above for removing the 
presence of leaders. Indeed, the major issue to cope with is the presence of a constant number 
of different color spreadings in each community and the protocol must select the right one in 
every community. However, if r is a constant and the number of nodes in each community is 
some constant fraction of n, then the impact of the presence of O(logn) colors in each of the 
r communities remains negligible till the overall number of colored nodes in each community is 
0(n/log^n). As in the previous paragraph, by first applying the minimal-color rule and then the 
majority one, the modified protocol returns a good-coloring w.h.p. 

- Sparse Graphs. When $p = o\left(\frac{1}{n}\right)$ and $q = O\left(\frac{p}{n^b}\right)$ (for some constant $b > 0$), the snapshots of
the dynamic graph are very sparse. So, every node must wait at least $\Theta\left(\frac{1}{pn}\right)$ time steps (on average)
in order to meet some other node. This implies that the coloring protocol will be slower. We can
reduce this case to the case $p = 1/n$ by considering the time-union random graph obtained from
$\mathcal{G}(n,p,q)$ according to the following

Definition 8 Let $\Delta$ be any positive integer and consider any sequence of graphs $G(V, E_1), \dots, G(V, E_\Delta)$.
Then, we define the $\Delta$-OR-graph $G^\Delta = (V, E^\Delta)$ where $E^\Delta = \bigcup_{i=1}^{\Delta} E_i$.

It is easy to prove the following

Lemma 9 Let $p < \frac{1}{n}$; then the $\frac{1}{pn}$-OR-graph of any finite sequence of graphs selected according to
the $\mathcal{G}(n,p,q)$ model is a $\mathcal{G}(n,\hat{p},\hat{q})$ with $\hat{p} = \Theta\left(\frac{1}{n}\right)$ and $\hat{q} = O\left(\frac{1}{n^{1+b}}\right)$.

The modified protocol just works as it would work over $\mathcal{G}(n,\hat{p},\hat{q})$ with $\hat{p} = \Theta\left(\frac{1}{n}\right)$ and $\hat{q} = O\left(\frac{1}{n^{1+b}}\right)$:
in every phase, every node applies the phase's coloring rule (only) every $\Delta = \frac{1}{pn}$ time steps on the
$\frac{1}{pn}$-OR-graph. The modified protocol thus requires $\Theta(\Delta \log n) = \Theta\left(\frac{\log n}{pn}\right)$ time.
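
The time-union construction is easy to sketch. The code below is an illustration with hypothetical names: edge sets are assumed to be Python sets of node pairs, and the per-edge probability of the union follows from independence across snapshots (an edge is missing from the union only if it is missing from every snapshot).

```python
def or_graph(edge_sets):
    """Edge set of the OR-graph: the union of all snapshot edge sets."""
    union = set()
    for edges in edge_sets:
        union |= edges
    return union

def or_edge_probability(p, delta):
    """Probability that a given edge appears in the union of `delta`
    independent snapshots, each containing it with probability p."""
    return 1.0 - (1.0 - p) ** delta
```

As a numeric check of the reduction: with n = 10^6 and p = 10^-8 (so pn = 0.01), taking a window of 1/(pn) = 100 snapshots gives `or_edge_probability(1e-8, 100)` of about 10^-6, that is, roughly 1/n.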

- Dense Graphs. When $p$ becomes larger than $\log n/n$ and $q = O\left(\frac{p}{n^b}\right)$, the coloring problem
becomes an easier task since standard probability arguments easily show that the (good) coloring 
process is faster and the related random variables (i.e. number of new colored nodes at every time 
step) have much smaller variance. This implies that the protocol can be simplified: for instance, 
the source-coloring phase (i.e. Phase 1) can be skipped while the length of other phases can be 
reduced significantly as a function of $p$. However, we again emphasize that dense dynamic random
graphs are not a good model for the scenario that inspired this work: ICMNs are opportunistic
networks having sparse and disconnected topology.

4 Conclusions 

This paper introduces a framework that allows an analytical study of the distributed community- 
detection problem in dynamic graphs. Then, it shows an efficient algorithmic solution in two classes 
of such graphs that model some features of opportunistic networks such as ICMNs. We believe that 
the problem deserves to be studied in other classes of dynamic graphs that may capture further 
relevant features of social opportunistic networks. 

Acknowledgements. We thank Stefano Leucci for his help in getting an efficient protocol simulation
over large random graphs.
A Experimental Results



We ran our protocol over sequences of independent random graphs according to the $\mathcal{G}(n,p,q)$ model.
The protocol has been suitably simplified and tuned in order to optimize the real performance. In
particular, the implemented protocol consists of 5 Phases: Phase 1 (Source-Coloring), Phases 2-3
(Fast-Coloring I-II), Phase 4 (Min-Coloring), and Phase 5 (Majority-Rule). The rules of each phase
are the same as those of the corresponding phase analyzed in Section [2]. Moreover, the length of every phase
is fixed to $c \log n$. As shown in the next tables, parameter $c$ is always very small and it depends on
the parameter $q$. The parameter $c$ has been heuristically chosen as the minimal one yielding the
good coloring in more than 98% of the trials. We consider instances of increasing size $n$ and, for
each size, we tested 100 random graphs. In the first experiment class (see Table 1), we consider
homogeneous sparse graphs with the following setting: p = ^ and 3 values of q ranging from 1/n^ 
to $1/n^{3/2}$.

Table 1: Experimental results for the homogeneous case. For every value of n, the rows
indicate the percentage of good-colorings for three choices of q and the "minimal" setting for c (the
total number of Protocol steps is inside brackets).

n          q = n^{-3/2}, c = 0.9    q = n^{-5/3}, c = 0.6    q = n^{-2}, c = 0.5
20000      99 (66)                  100 (46)                 100 (36)
40000      99 (71)                  100 (46)                 100 (41)
80000      100 (76)                 100 (51)                 100 (41)
160000     100 (81)                 100 (51)                 100 (46)
320000     100 (86)                 100 (56)                 99 (46)
640000     100 (91)                 100 (61)                 100 (51)
1280000    100 (91)                 100 (61)                 100 (51)
2560000    100 (96)                 100 (66)                 100 (56)


The second class of experiments concerns non-homogeneous random graphs. For each pair of
nodes $e = (u, v)$ in the same community, the birth-probability $p_e$ is randomly fixed in a range $[d_1/n, d_2/n]$
before starting the graph-sequence generation. Then, at every time step $t \ge 0$, the graph-snapshot
$G(V, E_t)$ is generated by selecting every edge $e = (u, v)$ according to its birth-probability $p_e$ (the
edges between the two communities are generated with parameter $q$). In Table 2, the experimental
results are shown for the case $d_1 = 1$ and $d_2 = 9$ in order to generate sparse topologies inside the
communities, while in Table 3, the results concern the more dense case where di = and d2 = logn. 
The protocol's implementation is the same as in the homogeneous case above.

The experiments globally show that the tuning of parameter $c$ mainly depends on the value
of $q$, even though it can be fixed to small values in all studied cases. Moreover, the presence of
a non-homogeneous edge-probability function seems to slightly "help" the efficiency of the protocol.
Intuitively speaking, we believe this is due to the presence of fully-random irregularities in the
graph topology that help the protocol to break the symmetry of the initial configuration.
Table 2: Experimental results for the non-homogeneous sparse case with $d_1 = 1$ and
$d_2 = 9$.

n          q = 1/n^{3/2}, c = 1     q = 1/n^{5/3}, c = 0.4   q = 1/n^{2}, c = 0.4
20000      100 (46)                 100 (46)                 100 (36)
40000      98 (71)                  99 (46)                  100 (41)
80000      100 (76)                 100 (51)                 100 (41)
160000     100 (81)                 100 (51)                 100 (46)
320000     100 (86)                 100 (56)                 100 (46)
640000     100 (91)                 100 (61)                 100 (51)
1280000    100 (91)                 100 (61)                 100 (51)


Table 3: Experimental results for the non-homogeneous case with d1 = and d2 = log n.

n          q = n^{-3/2}, c = 1      q = n^{-5/3}, c = 0.4    q = n^{-2}, c = 0.4
20000      99 (76)                  100 (31)                 100 (31)
40000      99 (81)                  100 (31)                 100 (31)
80000      98 (86)                  100 (31)                 100 (31)
160000     100 (91)                 100 (36)                 100 (36)
320000     100 (96)                 100 (36)                 100 (36)
640000     100 (101)                100 (41)                 100 (41)
1280000    100 (106)                100 (41)                 100 (41)


B Useful Tools 

Lemma 10 If $x = o(1)$ and $xy = o(1)$ then

$(1 - x)^y \ge 1 - xy(1 + 2x)$
$(1 - x)^y \le 1 - xy(1 - xy)$

We will often use the Chernoff bounds

Lemma 11 (Chernoff's Bound.) Let $X = \sum_{i=1}^{n} X_i$ where $X_1, \dots, X_n$ are independent Bernoulli
random variables and let $0 < \delta < 1$. If $0 < \mu_1 \le E[X]$ and $\mu_2 \ge E[X]$, then it holds that

$P(X < (1 - \delta)\mu_1) \le e^{-\delta^2 \mu_1 / 2}$ (17)

$P(X > (1 + \delta)\mu_2) \le e^{-\delta^2 \mu_2 / 3}$ (18)

Lemma 12 Let $\psi$ be any poly-logarithm and $E_0, E_1, \dots, E_\psi$ be events that hold w.h.p.; then
$E_0 \cap E_1 \cap \dots \cap E_\psi$ holds w.h.p.
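
As a sanity check on how the Chernoff bounds are used, the sketch below compares exact binomial tails with the standard multiplicative forms exp(-δ²μ/2) for the lower tail and exp(-δ²μ/3) for the upper tail. These constants are the textbook ones and are an assumption of this sketch, since the exponents are not legible in this copy; the function names are hypothetical.

```python
import math

def chernoff_lower(mu1, delta):
    """Textbook bound on P(X < (1 - delta) * mu1), for 0 < delta < 1."""
    return math.exp(-delta * delta * mu1 / 2.0)

def chernoff_upper(mu2, delta):
    """Textbook bound on P(X > (1 + delta) * mu2), for 0 < delta < 1."""
    return math.exp(-delta * delta * mu2 / 3.0)

def binomial_tails(n, p, low, high):
    """Exact P(X < low) and P(X > high) for X ~ Binomial(n, p)."""
    pmf = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    below = sum(pmf[k] for k in range(n + 1) if k < low)
    above = sum(pmf[k] for k in range(n + 1) if k > high)
    return below, above
```

For X distributed as Binomial(100, 1/2) and δ = 0.2, `binomial_tails(100, 0.5, 40.0, 60.0)` gives two exact tail probabilities that both sit well below the corresponding bounds, which is the slack the proofs rely on when choosing constants.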